Focusing on Open-Source and Free Software

Open-source software is software whose source code is publicly available and that is developed and supported by a user community.

Although open-source software is licensed, it is typically free; the license simply requires you to adhere to certain policies when you use the software. In this section, we talk about the two most popular open-source statistical software packages: R and Python.

Open-source software

The two most popular and extensive open-source statistical programs are R and Python.

R: R is statistical software that has been developed and is maintained by the R user community. It has two interfaces: R GUI, which looks similar to PC SAS and SPSS, and RStudio, which is an integrated development environment (IDE). Many analysts prefer RStudio when developing graphical displays for the web, while R GUI is fine for most statistical work. To run R, you download and install the base application. Then, for specialized functions not included in the base application, you install additional R packages. As with PC SAS, in R you import or connect to datasets, develop and save code files to run on those datasets, and produce output you can save. Base R, R packages, and documentation are available on the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org.

Python: Python is an open-source programming language that is often used to analyze data. As with R, Python is developed and maintained by its own user community, and you work with it in a similar way: you develop code that runs against datasets in the Python environment. However, Python code and R code use different syntax. Where R adds functionality through packages, Python adds it through libraries. Python is available at www.python.org/downloads.
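As a rough illustration of the workflow this section describes (import a dataset, run code against it, and save the output), here is a minimal Python sketch. It is not a recommended setup: the student scores dataset, the `score` column, and the output file name `summary.csv` are hypothetical, and to stay self-contained the sketch uses only the standard library's `csv` and `statistics` modules rather than the add-on libraries analysts commonly install.

```python
import csv
import io
import statistics

# Hypothetical dataset. In practice you would open a file on disk,
# e.g. open("scores.csv", newline=""); an in-memory string keeps this self-contained.
DATA = """student,score
Ana,88
Ben,92
Cai,75
Dee,81
"""


def load_scores(text):
    """Import step: parse the CSV text and return the 'score' column as floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["score"]) for row in reader]


def summarize(values):
    """Analysis step: compute basic descriptive statistics."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def save_output(summary, path):
    """Output step: write the statistics to a small CSV report."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["statistic", "value"])
        for name, value in summary.items():
            writer.writerow([name, value])


if __name__ == "__main__":
    scores = load_scores(DATA)
    summary = summarize(scores)
    print(summary)
    save_output(summary, "summary.csv")  # hypothetical output file name
```

Running the script prints a dictionary with n = 4, mean 84.0, and median 84.5, and writes the same statistics to `summary.csv` in the current directory. An equivalent R script would follow the same three steps with R's own functions.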

Students often wonder what the differences between R and Python are, and which one to learn. Both can handle most of the same statistical work, although scientific disciplines have leaned toward adopting R and engineering disciplines have leaned toward Python. Many students find they can easily learn both.

Other free statistical software

Some other statistical software packages are free, but they are not technically open-source, meaning they were not developed by an open-source community and they are not licensed the same way.

Software that performs many functions

This section provides examples of free software that, like SAS and R, performs many functions.

OpenStat and LazStats are free statistical programs developed by Dr. Bill Miller that use menus resembling those of SPSS. Dr. Miller provides several excellent manuals and textbooks that support both programs. OpenStat and LazStats are available at https://openstat.info.

Epi Info was developed by the United States Centers for Disease Control and Prevention (CDC) to acquire, manage, analyze, and display the results of epidemiological research. What makes it different from other